Smart Paper Notes

Plant Disease Detection Using Image Processing and Machine Learning

Pranesh Kulkarni, Atharva Karwande, Tejas Kolhe, Soham Kamble, Akshay Joshi, Medha Wyawahare

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-06-20

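Detecting crop diseases is one of the most important and tedious tasks in agricultural practice, and it demands a great deal of time and skilled labor. This paper proposes a smart and efficient technique for detecting crop diseases using computer vision and machine learning. The proposed system can detect 20 different diseases across 5 common plants with 93% accuracy.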

Translated by Google Translate

In and Out-of-Domain Text Adversarial Robustness via Label Smoothing

Yahan Yang, Soham Dan, Dan Roth, Insup Lee

Categories: Natural Language Processing | Machine Learning

2022-12-20

Recently it has been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, where the predictions of a model can be drastically altered by slight modifications to the input (such as synonym substitutions). While several defense techniques have been proposed and adapted to the discrete nature of text adversarial attacks, the benefits of general-purpose regularization methods such as label smoothing for language models have not been studied. In this paper, we study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks in both in-domain and out-of-domain settings. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
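
As a rough illustration of the regularizer this abstract studies, the sketch below implements the standard uniform form of label smoothing and compares the resulting cross-entropy with the hard-label case. It is not the authors' code: the function names, the toy logits, and the smoothing value alpha = 0.1 are assumptions chosen for illustration, and the paper evaluates several smoothing strategies rather than only this one.

```python
import math

def smooth_labels(true_class, num_classes, alpha=0.1):
    """Uniform label smoothing: mix the one-hot target with a uniform distribution.

    alpha is a hypothetical smoothing strength; alpha=0.0 gives the plain one-hot target.
    """
    uniform = alpha / num_classes
    target = [uniform] * num_classes
    target[true_class] += 1.0 - alpha
    return target

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy between a (possibly smoothed) target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))

# Toy example: a model that is very confident in class 0.
logits = [4.0, 0.5, -1.0]
hard = smooth_labels(0, 3, alpha=0.0)
soft = smooth_labels(0, 3, alpha=0.1)
print("hard target:", hard, "loss:", cross_entropy(logits, hard))
print("soft target:", soft, "loss:", cross_entropy(logits, soft))
```

With the smoothed target, a highly confident prediction is penalized more than under the one-hot target, which is the mechanism by which label smoothing discourages over-confident outputs.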

Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

Categories: Machine Learning

2022-12-13

Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.

Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu

Categories: Computer Vision | Machine Learning

2022-12-09

This work explores an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. We present VideoCoCa, which reuses a pretrained image-text contrastive captioner (CoCa) model and adapts it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, we surprisingly find that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to "flattened frame embeddings", yielding a strong zero-shot transfer baseline for many video-text tasks. Specifically, the frozen image encoder of a pretrained image-text CoCa takes each video frame as input and generates \(N\) token embeddings per frame, for a total of \(T\) video frames. We flatten the \(N \times T\) token embeddings into one long sequence that serves as a frozen video representation and apply CoCa's generative attentional pooling and contrastive attentional pooling on top. All model weights, including the pooling layers, are loaded directly from a pretrained image-text CoCa model. Without any video or video-text data, VideoCoCa's zero-shot transfer baseline already achieves state-of-the-art results on zero-shot video classification on Kinetics 400/600/700, UCF101, HMDB51, and Charades, as well as zero-shot text-to-video retrieval on MSR-VTT and ActivityNet Captions. We also explore lightweight finetuning on top of VideoCoCa, and achieve strong results on video question-answering (iVQA, MSRVTT-QA, MSVD-QA) and video captioning (MSR-VTT, ActivityNet, Youcook2). Our approach establishes a simple and effective video-text baseline for future research.
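
The flatten-then-pool idea in this abstract can be sketched in a few lines. The example below is a deliberately simplified illustration, not the VideoCoCa implementation: it uses a single query vector, one attention head, and no learned projections, and the function names, toy embeddings, and dimensions are all assumptions made for the example.

```python
import math

def flatten_frames(frame_tokens):
    """Concatenate per-frame token lists: T frames x N tokens -> one sequence of N*T tokens."""
    return [token for frame in frame_tokens for token in frame]

def attention_pool(tokens, query):
    """Single-query dot-product attention pooling.

    Simplified on purpose: one head, no learned key/value projections.
    Returns the attention-weighted average of the token embeddings.
    """
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, tok)) / math.sqrt(dim) for tok in tokens]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]

# Toy input: T = 2 frames, N = 3 tokens per frame, embedding dimension 2.
frames = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
]
sequence = flatten_frames(frames)                      # length N * T = 6
pooled = attention_pool(sequence, query=[1.0, 0.0])    # hypothetical query vector
print(len(sequence), pooled)
```

Because the pooling step only sees a sequence of tokens, it does not need to know which frame each token came from; that is why a pooler trained on single images can be applied to the longer flattened sequence without architectural changes.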

Performance Evaluation of Vanilla, Residual, and Dense 2D U-Net Architectures for Skull Stripping of Augmented 3D T1-weighted MRI Head Scans

Anway S. Pimpalkar, Rashmika K. Patole, Ketaki D. Kamble, Mahesh H. Shindikar

Categories: Computer Vision

2022-11-29

Skull Stripping is a requisite preliminary step in most diagnostic neuroimaging applications. Manual Skull Stripping methods define the gold standard for the domain but are time-consuming and challenging to integrate into processing pipelines with a high number of data samples. Automated methods are an active area of research for head MRI segmentation, especially deep learning methods such as U-Net architecture implementations. This study compares Vanilla, Residual, and Dense 2D U-Net architectures for Skull Stripping. The Dense 2D U-Net architecture outperforms the Vanilla and Residual counterparts by achieving an accuracy of 99.75% on a test dataset. It is observed that dense interconnections in a U-Net encourage feature reuse across layers of the architecture and allow for shallower models with the strengths of a deeper network.

Audio Retrieval with WavText5K and CLAP Training

Soham Deshmukh, Benjamin Elizalde, Huaming Wang

Categories: Artificial Intelligence

2022-09-28

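Audio-Text retrieval takes a natural language query to retrieve relevant audio files from a database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant natural language descriptions. Most of the literature trains retrieval systems on a single audio captioning dataset, and the benefit of training on multiple datasets remains underexplored. Moreover, retrieval systems have to learn the alignment between detailed sentences and audio content of variable length, ranging from a few seconds to several minutes. In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval. First, we provide a new collection of about five thousand web audio-text pairs, which we call WavText5K. When used to train our retrieval system, WavText5K improved performance more than other audio captioning datasets. Second, our framework learns to connect language and audio content using a text encoder, two audio encoders, and a contrastive learning objective. Combining the two audio encoders helps to process variable-length audio. Together, the two contributions exceed state-of-the-art performance on AudioCaps and Clotho by a relative 2% and 16% for Text-Audio retrieval, and by 6% and 23% for Audio-Text retrieval.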
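
A contrastive learning objective of the kind this abstract mentions is commonly written as a symmetric cross-entropy over a similarity matrix between paired embeddings. The sketch below shows that general form on toy data. It is not the paper's implementation: the embeddings stand in for encoder outputs, the function names are hypothetical, and the temperature of 0.07 is an assumed value.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one row of similarity scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: audio i and text i form the positive pair.

    temperature is an assumed hyperparameter, not a value taken from the abstract.
    """
    a = [normalize(v) for v in audio_emb]
    t = [normalize(v) for v in text_emb]
    n = len(a)
    sim = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]
    # Audio-to-text direction: each row should peak on its diagonal entry.
    audio_to_text = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-audio direction: same computation on the transposed matrix.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    text_to_audio = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (audio_to_text + text_to_audio)

# Toy embeddings standing in for encoder outputs.
audio = [[1.0, 0.1], [0.0, 1.0]]
matched_text = [[0.9, 0.0], [0.1, 1.0]]
swapped_text = [matched_text[1], matched_text[0]]
print(contrastive_loss(audio, matched_text), contrastive_loss(audio, swapped_text))
```

The loss is low when each audio clip is most similar to its own caption and high when the pairing is wrong, so minimizing it pulls matching audio and text embeddings together in a shared space.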

Adapting Task-Oriented Dialogue Models for Email Conversations

Soham Deshmukh, Charles Lee

Categories: Natural Language Processing

2022-08-19

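Intent detection is a key part of any natural language understanding (NLU) system in a conversational assistant. Detecting the correct intent is essential yet difficult in email conversations, where multiple directives and intents are present. In such settings, conversation context can become a key disambiguating factor for detecting the user's request to the assistant. One prominent way of incorporating context is to model past conversation history, as task-oriented dialogue models do. However, the long-form nature of email conversations restricts direct use of the latest advances in task-oriented dialogue models. In this paper, we therefore provide an effective transfer learning framework (EMTOD) that allows the latest developments in dialogue models to be adapted to long-form conversations. We show that the proposed EMTOD framework improves intent detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations. In addition, the modular nature of the proposed framework allows plug-and-play integration of any future developments in both pre-trained language models and task-oriented dialogue models.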

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

Categories: Machine Learning

2022-08-15

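Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, for example reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with N strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool for addressing the complexity of this domain. To facilitate this research, we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy and can be used to design and evaluate the strategic outcomes of different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using RICE-N. This framework underpins AI for Global Climate Cooperation, a working group collaboration and competition on climate negotiation and agreement design. We invite the scientific community to design and evaluate their solutions using RICE-N, machine learning, economic intuition, and other domain knowledge. More information can be found at www.ai4climatecoop.org.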

Comparison of Deep Learning and Machine Learning Models and Frameworks for Skin Lesion Classification

Soham Bhosale

Categories: Computer Vision

2022-07-26

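The incidence of skin cancer has been rising steadily throughout the world, which is a serious problem. Early diagnosis has the potential to drastically reduce the harm caused by the disease, but the traditional biopsy is a labor-intensive and invasive procedure. In addition, many rural communities do not have easy access to hospitals and are reluctant to visit one for what they believe may be a minor issue. Using machine learning and deep learning for skin cancer classification can increase accessibility and reduce the uncomfortable procedures involved in the traditional lesion detection process. These models can be wrapped in web or mobile apps and serve a larger population. In this paper, two such models are tested on the benchmark HAM10000 dataset of common skin lesions: Random Forest with stratified k-fold validation, and MobileNetV2 (referred to as MobileNet in the rest of the paper). The MobileNet model was trained separately using the TensorFlow and PyTorch frameworks. The paper presents a side-by-side comparison of the deep learning and machine learning models, as well as a comparison of the same deep learning model across frameworks, for skin lesion diagnosis in a resource-constrained mobile environment. The results indicate that each of these models performs better on different classification tasks. For greater overall recall, accuracy, and detection of malignant melanoma, the TensorFlow MobileNet was the better choice. For detecting non-cancerous skin lesions, however, the PyTorch MobileNet proved better. Random Forest was the better algorithm when low computational cost with moderate correctness was the priority.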
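
Stratified k-fold validation, mentioned in this abstract, splits the data so that each fold keeps roughly the same class proportions as the full dataset, which matters when some classes are rare. The sketch below is a minimal, deterministic version written for illustration only: the function name and the toy labels are assumptions, it does no shuffling, and it is not the procedure or library call used in the paper.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Assign each sample index to one of k folds, class by class.

    Samples of each class are dealt round-robin across folds, so every
    fold ends up with roughly the overall class mix. No shuffling is done.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Toy, imbalanced label set (hypothetical class names and counts).
labels = ["nevus"] * 6 + ["melanoma"] * 3
folds = stratified_k_fold(labels, k=3)
for held_out in folds:
    # Train on every fold except the held-out one.
    train = [i for fold in folds if fold is not held_out for i in fold]
    print("validate on", sorted(held_out), "train on", len(train), "samples")
```

Each of the three folds here contains two samples of the majority class and one of the minority class, so every validation split sees the rare class at least once.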

Human-guided Collaborative Problem Solving: A Natural Language based Framework

Harsha Kokel, Mayukh Das, Rakibul Islam, Julia Bonn, Jon Cai, Soham Dan, Anjali Narayan-Chen, Prashant Jayannavar, Janardhan Rao Doppa, Julia Hockenmaier

Categories: Artificial Intelligence | Natural Language Processing

2022-07-19

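We consider human-machine collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components: a natural language engine that parses language utterances into a formal representation and vice versa, a concept learner that induces generalized concepts for plans based on limited interactions with the user, and an HTN planner that solves the task based on human interaction. We illustrate the framework's ability to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based Blocksworld domain. The accompanying demo video is available at https://youtu.be/q1pwe4aahf0.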
